feat: sandcastle refinement loop with critic-based convergence by jerome-benoit · Pull Request #111 · jerome-benoit/sap-ai-provider

jerome-benoit · 2026-05-04T23:26:15Z

Description

Replace single-pass implement→review→merge with a modular iterative implement↔critic refinement loop. Each task gets its own parallel sandbox with convergence detection, quality ratchet, and automated PR creation.

Architecture

Planner (opus) → selects issues
  For each issue (parallel, max 3):
    Sandbox with implement↔critic loop:
      Implementer (sonnet) → codes + commits + pushes
      Critic (sonnet) → structured findings JSON (nonce-tagged)
      Dedup (context-hash) → convergence check
      Quality ratchet → rollback on regression
      Best-state checkpoint → restore optimal intermediate
      Validation-in-loop (ARCS) → deterministic convergence
    Finalize: validate → rebase → PR (draft if non-converged)

Key Design Decisions

Flat iteration budget (50/round) — evidence: ARCS, SWE-Agent, AutoCodeRover all use flat budgets
Context-hash dedup (±3 lines SHA-256) — drift-safe, CodeQL/Qodana pattern
Severity-weighted convergence — refuses convergence if CRITICAL/HIGH persist (OpenHands)
Best-state tracking — resets to best intermediate on non-convergence (SWE-Agent)
Validation-in-loop — deterministic convergence when tests pass (ARCS)
Async subprocess execution — util.promisify(execFile) unblocks event loop for true parallelism
Nonce-tagged critic output — prevents injection from code content
One PR per task — no batch merge, each issue gets its own PR
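The severity-weighted convergence rule above can be sketched as a small pure function. This is only an illustration under stated assumptions: `SketchFinding`, `LoopStatus`, and `decideStatus` are hypothetical names, and the real decision logic in `refinement-loop.ts` may weigh findings differently.

```typescript
// Hypothetical shapes; the PR's real Finding type lives in types.ts and may differ.
type Severity = "CRITICAL" | "HIGH" | "MEDIUM" | "LOW";

interface SketchFinding {
  severity: Severity;
  title: string;
}

type LoopStatus = "converged" | "continue" | "exhausted";

/**
 * Decides the loop status after one critic round.
 * - CRITICAL/HIGH findings always block convergence.
 * - Otherwise the loop converges when validation passes or nothing new was found.
 * - A stalled or over-budget loop is reported as exhausted.
 */
function decideStatus(
  allFindings: SketchFinding[],
  newFindingsCount: number,
  validationPassed: boolean,
  round: number,
  maxRounds: number,
): LoopStatus {
  const blocking = allFindings.some(
    (f) => f.severity === "CRITICAL" || f.severity === "HIGH",
  );
  if (!blocking && (validationPassed || newFindingsCount === 0)) return "converged";
  if (round >= maxRounds || newFindingsCount === 0) return "exhausted";
  return "continue";
}
```

Keeping the decision pure makes it easy to unit-test separately from the sandbox and git side effects.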

Modules

| File | Lines | Responsibility |
| --- | --- | --- |
| `constants.ts` | 64 | Shared constants + `execFileAsync` + `getHeadSha` + `toErrorMessage` |
| `types.ts` | 83 | Zod schemas + exported interfaces + `parseFindingsSafe` |
| `concurrency-pool.ts` | 69 | O(1) FIFO semaphore (linked list) |
| `task-source.ts` | 248 | `TaskSource` interface + `GithubIssueSource` (fetch + sanitize + plan) |
| `refinement-loop.ts` | 580 | Core loop: implement↔critic + dedup + ratchet + convergence |
| `finalizer.ts` | 281 | Validate + retry + rebase + push + PR creation |
| `main.ts` | 103 | Thin orchestrator: discover → pool → loop → finalize |

Prompts

| Prompt | Role | Key rules |
| --- | --- | --- |
| `plan-prompt.md` | Issue selection | Prefer single-file scope, exclude blocked |
| `implement-prompt.md` | Code + commit + push | Cross-validate findings, full validation before push |
| `critic-prompt.md` | Structured review | ≤5 HIGH/CRIT findings, nonce-tagged JSON, known decisions blocklist |

Type of Change

New feature (non-breaking change that adds functionality)
Refactoring (no functional changes)

Checklist

I have run npm run type-check && npm run test && npm run prettier-check && npm run lint
I have run npm run build && npm run check-build && npm run build:v2 && npm run check-build:v2
My changes follow the existing code style
E2E tested locally (planner + parallel implementers started successfully)

Related Issues

Fixes #110

Replace single-pass implement→review→merge with iterative implement↔critic loop per task.

Key changes:
- Orchestrator fetches and sanitizes issues (prevents prompt injection)
- Implement↔Critic loop with deterministic dedup convergence
- Critic produces structured findings (nonce-tagged JSON, zod-validated)
- Decreasing iteration budget per round [100, 50, 25, 10, 10]
- Host-side validation and rebase (no agent needed)
- One PR per task (no merger agent)
- Draft PR on non-convergence with outstanding findings listed

Implements #110

Copilot

Pull request overview

This PR updates the Sandcastle automation workflow to replace the prior single-pass implement→review→merge flow with an iterative implement↔critic refinement loop, aiming for deterministic convergence based on deduplicated structured findings, followed by host-side validation and PR creation.

Changes:

Pre-fetch and sanitize GitHub issues in the orchestrator, passing issue data into the planner/implementer prompts instead of shell-expanding gh calls inside prompts.
Add a new Critic agent prompt + parsing/dedup logic to iterate implement→critic rounds until no new findings are produced (or a hard cap is reached).
Remove the separate review/merge prompt phases and move validation + PR creation to host-side execSync calls.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| `.sandcastle/main.ts` | Implements the implement↔critic loop, findings parsing/dedup, host-side validation, rebase/push, and PR creation. |
| `.sandcastle/plan-prompt.md` | Switches planner input from `gh issue list ...` shell expansion to injected `{{ISSUES_JSON}}`. |
| `.sandcastle/implement-prompt.md` | Switches issue input to injected `{{ISSUE_BODY}}` and adds `{{FINDINGS}}` as refinement input. |
| `.sandcastle/critic-prompt.md` | New prompt defining nonce-tagged JSON findings output for the critic agent. |
| `.sandcastle/review-prompt.md` | Removed (review agent phase eliminated). |
| `.sandcastle/merge-prompt.md` | Removed (merge agent phase eliminated). |

@@ -115,99 +180,186 @@ for (let iteration = 1; iteration <= MAX_PLANNER_RETRIES; iteration++) {
          sandbox: docker({ imageName: DOCKER_IMAGE }),
        });


+function parseFindings(stdout: string, nonce: string): Finding[] | null {
+  const tagPattern = new RegExp(`<findings-${nonce}>([\\s\\S]*?)<\\/findings-${nonce}>`, "g");
+  const matches = [...stdout.matchAll(tagPattern)];
+  const raw = matches.at(-1)?.[1]?.trim() ?? "[]";
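For readers following the nonce-tagged parsing in the hunk above, here is a self-contained sketch of the idea. It assumes a per-run hex nonce and a hand-written shape check in place of the PR's zod schema; `makeNonce`, `parseTagged`, and `SketchFinding` are hypothetical names. Unlike the hunk, it returns `null` when no tagged block is present, so a missing block is not mistaken for "no findings".

```typescript
import { randomBytes } from "node:crypto";

// Hypothetical minimal shape; the PR validates findings with a zod schema in types.ts.
interface SketchFinding {
  severity: string;
  title: string;
}

/** Generates a fresh random tag suffix for one critic run. */
function makeNonce(): string {
  return randomBytes(8).toString("hex");
}

function isSketchFinding(value: unknown): value is SketchFinding {
  if (typeof value !== "object" || value === null) return false;
  const item = value as Record<string, unknown>;
  return typeof item.title === "string" && typeof item.severity === "string";
}

/**
 * Extracts the last <findings-NONCE>...</findings-NONCE> block from stdout.
 * Assumes the nonce is hex, so it is safe to interpolate into the pattern.
 * Returns null on a missing block, invalid JSON, or an unexpected shape.
 */
function parseTagged(stdout: string, nonce: string): SketchFinding[] | null {
  const pattern = new RegExp(`<findings-${nonce}>([\\s\\S]*?)</findings-${nonce}>`, "g");
  let last: string | undefined;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(stdout)) !== null) {
    last = match[1];
  }
  if (last === undefined) return null;
  try {
    const parsed: unknown = JSON.parse(last.trim());
    if (!Array.isArray(parsed) || !parsed.every(isSketchFinding)) return null;
    return parsed;
  } catch {
    return null;
  }
}
```

Because the reviewed code cannot know the nonce in advance, a `<findings-...>` tag embedded in a source file under review will not match the pattern.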


+          const prTitle = `fix: resolve #${issue.id} — ${issue.title}`;
+          const prBody = `## Description\n\nAutomated fix for #${issue.id}: ${issue.title}\n\n## Type of Change\n\n- [x] Bug fix (non-breaking change that fixes an issue)\n\n## Checklist\n\n- [x] I have run validation suite\n- [x] My changes follow the existing code style\n\n## Related Issues\n\nFixes #${issue.id}${outstandingNote}${validationNote}`;
+
+          try {
+            execSync(
+              `gh pr create${draftFlag} --head "${issue.branch}" --base main --title "${prTitle}" --body "${prBody.replace(/"/g, '\\"')}"`,
+              { cwd, stdio: "pipe" },
+            );


…ting, zod validation)
- Replace execSync with execFileSync for gh pr create (prevents shell injection)
- Guard parseFindings against empty matches (prevents false convergence)
- Add try/catch on gh issue list startup call
- Guard git push in rebase catch block
- Extract finalizeIssue function (reduces nesting from 6+ to 3 levels)
- Add zod schema for rawIssues (replaces unsafe 'as' cast)
- Implement validation retry round per spec (one more implement→critic if budget remains)
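To illustrate why replacing the interpolated `execSync` string with an argument array removes the shell-injection risk, here is a minimal sketch. `PrOptions`, `buildPrArgs`, and `createPr` are hypothetical names, not the PR's actual helpers; the `gh pr create` flags used (`--head`, `--base`, `--title`, `--body`, `--draft`) are standard GitHub CLI options.

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);

// Hypothetical option bag for this sketch.
interface PrOptions {
  body: string;
  branch: string;
  draft: boolean;
  title: string;
}

/** Builds the argv for `gh pr create`; every value stays a single argument. */
function buildPrArgs(options: PrOptions): string[] {
  const args = [
    "pr", "create",
    "--head", options.branch,
    "--base", "main",
    "--title", options.title,
    "--body", options.body,
  ];
  if (options.draft) args.push("--draft");
  return args;
}

/** Runs gh without a shell, so quotes and $(...) in titles are never interpreted. */
async function createPr(options: PrOptions, cwd: string): Promise<void> {
  await execFileAsync("gh", buildPrArgs(options), { cwd });
}
```

With this shape there is no need for the `prBody.replace(/"/g, '\\"')` escaping seen in the earlier hunk, since no shell ever parses the string.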

Copilot

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 8 comments.

+  // --- Validation retry round (fix #7) ---
+  if (!validationPassed && round < MAX_CRITIC_ROUNDS) {
+    const retryBudget = ITERATION_BUDGET[MAX_CRITIC_ROUNDS - 1] ?? 10;
+    console.log(
+      `  #${issue.id}: Retrying one more implement→critic round (budget: ${String(retryBudget)})`,
+    );
+
+    try {
+      await sandbox.run({
+        agent: sandcastle.opencode("github-copilot/claude-sonnet-4.6"),
+        maxIterations: retryBudget,
+        name: `Implementer #${issue.id} retry`,
+        promptArgs: {
+          BRANCH: issue.branch,
+          FINDINGS: lastFindings.length > 0 ? JSON.stringify(lastFindings, null, 2) : "",
+          ISSUE_BODY: issue.body,
+          ISSUE_TITLE: issue.title,
+          TASK_ID: issue.id,
+        },
+        promptFile: "./.sandcastle/implement-prompt.md",
+      });


+  let issues: { body: string; branch: string; id: string; title: string }[];
  try {
    const parsed = JSON.parse(planContent) as { issues: unknown[] };
    if (!Array.isArray(parsed.issues)) {
-      console.error("Planner output missing issues array. Skipping iteration.");
+      console.error("Planner output missing issues array. Retrying.");
      continue;
    }
    const validated = parsed.issues.filter(
-      (entry): entry is { branch: string; id: string; title: string } => {
-        if (typeof entry !== "object" || entry === null) {
-          console.warn("  Skipping non-object issue entry");
-          return false;
-        }
+      (entry): entry is { body: string; branch: string; id: string; title: string } => {
+        if (typeof entry !== "object" || entry === null) return false;
        const item = entry as Record<string, unknown>;
-        if (typeof item.id !== "string" || !/^\d+$/.test(item.id)) {
-          console.warn(`  Skipping issue with invalid id: ${String(item.id)}`);
-          return false;
-        }
-        if (typeof item.branch !== "string") {
-          console.warn("  Skipping issue with missing branch");
-          return false;
-        }
-        if (typeof item.title !== "string") {
-          console.warn("  Skipping issue with missing title");
-          return false;
-        }
-        if (!BRANCH_PATTERN.test(item.branch)) {
-          console.warn(`  Skipping issue with invalid branch: ${item.branch}`);
-          return false;
-        }
+        if (typeof item.id !== "string" || !/^\d+$/.test(item.id)) return false;
+        if (typeof item.branch !== "string" || !BRANCH_PATTERN.test(item.branch)) return false;
+        if (typeof item.title !== "string") return false;
        return true;
      },
    );
-    issues = validated;
+    // Attach sanitized body from our fetched data
+    issues = validated.map((v) => ({
+      ...v,
+      body: issuesJson.find((i) => String(i.number) === v.id)?.body ?? "",
+    }));
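The validation in this hunk can be read as two steps: filter the planner's entries with a type guard, then attach the issue body from the orchestrator's own fetched data instead of trusting planner output. Below is a standalone sketch of that pattern. The `BRANCH_PATTERN` value is an assumption (the real pattern is not shown in this excerpt), and `attachBodies` is a hypothetical helper name.

```typescript
interface PlannedIssue {
  branch: string;
  id: string;
  title: string;
}

interface FetchedIssue {
  body: string;
  number: number;
}

// Assumed pattern for illustration only; the PR's BRANCH_PATTERN may differ.
const BRANCH_PATTERN = /^[\w./-]+$/;

function isPlannedIssue(entry: unknown): entry is PlannedIssue {
  if (typeof entry !== "object" || entry === null) return false;
  const item = entry as Record<string, unknown>;
  return (
    typeof item.id === "string" &&
    /^\d+$/.test(item.id) &&
    typeof item.branch === "string" &&
    BRANCH_PATTERN.test(item.branch) &&
    typeof item.title === "string"
  );
}

/**
 * Keeps only well-formed planner entries and takes the body from fetched data.
 * Any body field supplied by the planner is ignored.
 */
function attachBodies(
  planned: unknown[],
  fetched: FetchedIssue[],
): (PlannedIssue & { body: string })[] {
  const bodyById = new Map<string, string>();
  for (const issue of fetched) bodyById.set(String(issue.number), issue.body);
  return planned.filter(isPlannedIssue).map((p) => ({
    body: bodyById.get(p.id) ?? "",
    branch: p.branch,
    id: p.id,
    title: p.title,
  }));
}
```

Building the lookup map once also avoids the repeated `issuesJson.find(...)` scan per validated entry.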


+  try {
+    execSync(
+      "npm run type-check && npm run test && npm run test:node && npm run test:edge && npm run prettier-check && npm run lint && npm run build && npm run check-build && npm run build:v2 && npm run check-build:v2",
+      { cwd, stdio: "pipe" },
+    );
+    validationPassed = true;
+  } catch {
+    console.warn(`  #${issue.id}: Validation failed.`);
+  }


+  // Rebase on latest main
+  let rebaseSucceeded = false;
+  try {
+    execSync("git fetch origin main && git rebase origin/main", {
+      cwd,
+      stdio: "pipe",
+    });
+    rebaseSucceeded = true;
+    if (validationPassed) {
+      // Post-rebase smoke test
+      try {
+        execSync("npm run type-check && npm run test", {
+          cwd,
+          stdio: "pipe",
+        });
+      } catch {
+        validationPassed = false;
+      }
+    }
+  } catch {
+    // Rebase failed — abort and push un-rebased
+    try {
+      execSync("git rebase --abort", { cwd, stdio: "pipe" });
+    } catch {
+      /* empty */
+    }
+    try {
+      execSync("git push", { cwd, stdio: "pipe" });
+    } catch (pushErr: unknown) {
+      const pushMsg = pushErr instanceof Error ? pushErr.message : String(pushErr);
+      console.warn(`  #${issue.id}: git push failed after rebase abort: ${pushMsg}`);
+    }
+  }


+/**
+ * @param text - Raw text to strip injection-prone tags from.
+ * @returns Sanitized text safe for prompt injection.
+ */
+function sanitizeForPrompt(text: string): string {
+  return text.replace(/<\/?(?:plan|findings[\w-]*|promise)[^>]*>/gi, "");
+}
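A short demonstration of what this regex strips may help reviewers. The first function copies the hunk's pattern; `sanitizeStrict` is a hypothetical variant, not part of the PR, that adds a word boundary so unrelated tags that merely start with `plan` or `promise` are left alone.

```typescript
/** Same pattern as the hunk above. */
function sanitizeForPrompt(text: string): string {
  return text.replace(/<\/?(?:plan|findings[\w-]*|promise)[^>]*>/gi, "");
}

/**
 * Hypothetical stricter variant: the \b requires the tag name to end
 * right after the keyword, so <planet> is no longer treated as <plan>.
 */
function sanitizeStrict(text: string): string {
  return text.replace(/<\/?(?:plan|findings[\w-]*|promise)\b[^>]*>/gi, "");
}

// Tags are removed, inner text is kept.
const cleaned = sanitizeForPrompt("Fix bug <plan>do this</plan> now");

// The original pattern also removes tags that only share a prefix.
const overMatched = sanitizeForPrompt("<planet>");
const preserved = sanitizeStrict("<planet>");
```

Here `cleaned` is `"Fix bug do this now"`, `overMatched` is the empty string, and `preserved` is still `"<planet>"`. Whether the over-match matters depends on how often issue bodies contain such tags.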


… PR)

Copilot

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.

+              converged = false;
+            } else {
+              converged = true;
+            }


…om labels

Copilot

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

+  const prTitle = `${commitPrefix}: resolve #${issue.id} — ${issue.title}`;
+  const prBody = `## Description\n\nAutomated fix for #${issue.id}: ${issue.title}\n\n## Type of Change\n\n- [x] Bug fix (non-breaking change that fixes an issue)\n\n## Checklist\n\n${validationCheck} I have run validation suite\n- [x] My changes follow the existing code style\n\n## Related Issues\n\nFixes #${issue.id}${outstandingNote}${validationNote}`;


Split main.ts (525 lines) into 6 self-contained modules:
- types.ts: shared domain types (TaskSpec, Finding, LoopResult, FinalizeResult)
- refinement-loop.ts: reusable implement↔critic loop engine
- finalizer.ts: validation, rebase, PR creation
- concurrency-pool.ts: semaphore utility
- task-source.ts: TaskSource interface + GithubIssueSource
- main.ts: 74-line thin orchestrator wiring all modules

The refinement loop is now reusable by any task source (GitHub issues, CI failures, manual triggers) without coupling to the planner.

Copilot

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.

+    try {
+      rawIssuesJson = execSync(
+        `gh issue list --state open --json number,title,labels,body --limit 50 --label "${this.label}"`,
+        { encoding: "utf-8" },
+      );


+      const validated = parsed.issues.filter(
+        (entry): entry is { body: string; branch: string; id: string; title: string } => {
+          if (typeof entry !== "object" || entry === null) return false;
+          const item = entry as Record<string, unknown>;
+          if (typeof item.id !== "string" || !/^\d+$/.test(item.id)) return false;
+          if (typeof item.branch !== "string" || !this.branchPattern.test(item.branch))
+            return false;
+          if (typeof item.title !== "string") return false;
+          return true;
+        },
+      );
+
+      return validated.map((v) => ({
+        ...v,
+        body: issuesJson.find((i) => String(i.number) === v.id)?.body ?? "",
+        labels: issuesJson.find((i) => String(i.number) === v.id)?.labels ?? [],
+      }));


+      return tasks;
+    }
+
+    return [];


Copilot

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 7 comments.

+    if (newFindings.length === 0) {
+      const nonLowFindings = findings.filter((f) => f.confidence !== "LOW");
+      if (nonLowFindings.length > 0) {
+        lastFindings = nonLowFindings;
+        status = "exhausted";
+      } else {
+        status = "converged";
+      }
+      break;


+      status = "exhausted";
+      break;


+Run `git diff main...{{BRANCH}}` to see all changes. Examine the diff carefully. For each issue found, produce a structured finding.
+


+  private running = 0;
+
+  /**
+   * @param max - Maximum number of concurrent tasks.
+   */
+  constructor(private readonly max: number) {}
+
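The module table describes `concurrency-pool.ts` as an O(1) FIFO semaphore backed by a linked list. A minimal sketch of such a pool follows, reusing the `running` and `max` names from the fragment above; the `Waiter` type and the `acquire`/`release` internals are assumptions about how it could be written, not the PR's actual code.

```typescript
// Hypothetical linked-list node holding one queued caller.
interface Waiter {
  next: Waiter | null;
  resolve: () => void;
}

class ConcurrencyPool {
  private head: Waiter | null = null;
  private readonly max: number;
  private running = 0;
  private tail: Waiter | null = null;

  constructor(max: number) {
    this.max = max;
  }

  /** Runs the task once a slot is free and always frees the slot afterwards. */
  async run<T>(task: () => Promise<T>): Promise<T> {
    await this.acquire();
    try {
      return await task();
    } finally {
      this.release();
    }
  }

  private acquire(): Promise<void> {
    if (this.running < this.max) {
      this.running++;
      return Promise.resolve();
    }
    // Enqueue at the tail: O(1), no array shifting.
    return new Promise<void>((resolve) => {
      const waiter: Waiter = { next: null, resolve };
      if (this.tail) this.tail.next = waiter;
      else this.head = waiter;
      this.tail = waiter;
    });
  }

  private release(): void {
    const waiter = this.head;
    if (waiter) {
      // Dequeue from the head and hand the slot over; running stays unchanged.
      this.head = waiter.next;
      if (!this.head) this.tail = null;
      waiter.resolve();
    } else {
      this.running--;
    }
  }
}
```

Handing the slot directly to the next waiter, instead of decrementing and re-incrementing `running`, keeps a newly arriving caller from overtaking the queue.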


+  if (!validationPassed && loopResult.roundsCompleted < MAX_CRITIC_ROUNDS) {
+    const retryBudget = ITERATION_BUDGET[MAX_CRITIC_ROUNDS - 1] ?? 10;
+    console.log(
+      `  #${spec.id}: Retrying one more implement round (budget: ${String(retryBudget)})`,
+    );
+
+    try {
+      await sandbox.run({
+        agent: sandcastle.opencode("github-copilot/claude-sonnet-4.6"),
+        maxIterations: retryBudget,
+        name: `Implementer #${spec.id} retry`,
+        promptArgs: {
+          BRANCH: spec.branch,
+          FINDINGS:
+            loopResult.lastFindings.length > 0
+              ? JSON.stringify(loopResult.lastFindings, null, 2)
+              : "",
+          ISSUE_BODY: spec.body,
+          ISSUE_TITLE: spec.title,
+          TASK_ID: spec.id,
+        },
+        promptFile: "./.sandcastle/implement-prompt.md",
+      });
+    } catch (retryErr: unknown) {
+      const retryMsg = retryErr instanceof Error ? retryErr.message : String(retryErr);
+      console.warn(
+        `  #${spec.id}: Implementer retry threw: ${retryMsg}. Falling through to PR creation.`,
+      );
+    }
+
+    try {
+      execSync(VALIDATION_COMMAND, { cwd, stdio: "pipe" });
+      validationPassed = true;
+      console.log(`  #${spec.id}: Validation passed after retry round.`);
+    } catch {
+      console.warn(`  #${spec.id}: Validation still fails after retry. Will create draft PR.`);
+    }
+  }


+/**
+ * Strips injection-prone tags from text.
+ * @param text - Raw text to sanitize.
+ * @returns Sanitized text safe for prompt injection.
+ */
+function sanitizeForPrompt(text: string): string {
+  return text.replace(/<\/?(?:plan|findings[\w-]*|promise)[^>]*>/gi, "");
+}


Copilot

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 5 comments.

  const settled = await Promise.allSettled(
-    issues.map(async (issue) => {
-      await acquire();
-      try {
-        await using sandbox = await sandcastle.createSandbox({
-          branch: issue.branch,
-          copyToWorktree: ["node_modules"],
-          hooks: {
-            sandbox: { onSandboxReady: [{ command: "npm install && npm run build" }] },
-          },
-          sandbox: docker({ imageName: DOCKER_IMAGE }),
-        });
+    tasks.map((spec) =>
+      pool.run(() =>
+        Promise.race([
+          (async () => {
+            await using sandbox = await sandcastle.createSandbox({
+              branch: spec.branch,
+              copyToWorktree: ["node_modules"],
+              hooks: {
+                sandbox: { onSandboxReady: [{ command: "npm install && npm run build" }] },
+              },
+              sandbox: docker({ imageName: DOCKER_IMAGE }),
+            });

-        const result = await sandbox.run({
-          agent: sandcastle.opencode("github-copilot/claude-sonnet-4.6"),
-          maxIterations: 100,
-          name: "Implementer #" + issue.id,
-          promptArgs: {
-            BRANCH: issue.branch,
-            ISSUE_TITLE: issue.title,
-            TASK_ID: issue.id,
-          },
-          promptFile: "./.sandcastle/implement-prompt.md",
-        });
+            const loopResult = await runRefinementLoop(spec, sandbox, {
+              iterationBudget: ITERATION_BUDGET_PER_ROUND,
+              maxRounds: MAX_CRITIC_ROUNDS,
+            });

-        if (result.commits.length > 0) {
-          try {
-            await sandbox.run({
-              agent: sandcastle.opencode("github-copilot/claude-sonnet-4.6"),
-              maxIterations: 10,
-              name: "Reviewer #" + issue.id,
-              promptArgs: {
-                BRANCH: issue.branch,
-              },
-              promptFile: "./.sandcastle/review-prompt.md",
+            let prCreated = false;
+            if (loopResult.totalCommits > 0) {
+              const cwd = sandbox.worktreePath;
+              const result = await finalizeTask(spec, loopResult, sandbox, cwd);
+              prCreated = result.prCreated;
+            }
+
+            return { prCreated, spec };
+          })(),
+          (() => {
+            const p = new Promise<never>((_, reject) => {
+              setTimeout(() => {
+                reject(new Error(`Task #${spec.id} timed out after ${String(TASK_TIMEOUT_MS)}ms`));
+              }, TASK_TIMEOUT_MS).unref();
+            });
+            p.catch(() => {
+              /* suppress unhandled rejection when task completes before timeout */
            });
-          } catch (reviewError: unknown) {
-            const msg = reviewError instanceof Error ? reviewError.message : String(reviewError);
-            console.warn(`  Reviewer for #${issue.id} failed, proceeding unreviewed: ${msg}`);
-          }
-        }
+            return p;
+          })(),
+        ]),
+      ),


+          const source = issueMap.get(v.id);
+          if (!source) return null;
+          return {
+            ...v,


+function pushBranch(cwd: string, spec: TaskSpec, rebaseSucceeded: boolean): boolean {
+  if (rebaseSucceeded) {
+    try {
+      execFileSync("git", ["push", "--force-with-lease"], { cwd, stdio: "pipe" });
+      return true;
+    } catch (pushErr: unknown) {
+      const pushMsg = pushErr instanceof Error ? pushErr.message : String(pushErr);
+      try {
+        const suffix = crypto.randomBytes(4).toString("hex");
+        execFileSync("git", ["push", "origin", `HEAD:refs/heads/rescue/${spec.branch}-${suffix}`], {
+          cwd,
+          stdio: "pipe",
+        });
+        console.warn(
+          `  #${spec.id}: Push failed. Commits preserved at rescue/${spec.branch}-${suffix}`,
+        );
+      } catch {
+        console.error(
+          `  #${spec.id}: Push failed and rescue failed. Commits will be lost on sandbox disposal: ${pushMsg}`,
+        );
+      }
+      return false;
+    }


…menter crash)

…ilent false convergence)

Copilot

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 8 comments.

+function sanitizeForPrompt(text: string): string {
+  const normalized = text.normalize("NFKC");
+  return normalized.replace(
+    /<\/?(?:plan|findings|promise|system|code|instructions|implement|review|tool_call)[^>]*>/gi,


+/** Maximum implement↔critic rounds before giving up. */
+export const MAX_CRITIC_ROUNDS = 5;
+
+/**
+ * Flat iteration budget per round (intentionally constant, not decreasing).
+ * Evidence: ARCS (arXiv:2504.20434), SWE-Agent, AutoCodeRover all use flat budgets.
+ * Decreasing schedules penalize harder residual problems in later rounds.
+ */
+export const ITERATION_BUDGET_PER_ROUND = 50;
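As a quick arithmetic check on the two schedules mentioned in this PR (the earlier decreasing schedule `[100, 50, 25, 10, 10]` and the flat 50 per round over 5 rounds), the totals and the last-round budgets compare as follows. The variable names are illustrative only.

```typescript
const MAX_ROUNDS = 5;
const FLAT_BUDGET = 50;
const DECREASING_SCHEDULE = [100, 50, 25, 10, 10];

// Total iterations available across all rounds.
const flatTotal = MAX_ROUNDS * FLAT_BUDGET; // 250
const decreasingTotal = DECREASING_SCHEDULE.reduce((sum, n) => sum + n, 0); // 195

// Budget left for the final round, where the hardest residual problems tend to sit.
const flatLastRound = FLAT_BUDGET; // 50
const decreasingLastRound = DECREASING_SCHEDULE[DECREASING_SCHEDULE.length - 1]; // 10
```

So the flat schedule gives the final round five times the budget of the decreasing one, at the cost of a higher worst-case total.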


+/**
+ * Computes a deduplication key for a finding using a context hash of surrounding lines.
+ * @param f - Finding to compute a key for.
+ * @param cwd - Working directory (worktree path) for reading file context.
+ * @param fileCache - Optional cache of file contents keyed by resolved path.
+ * @returns Composite dedup key.
+ */
+function findingKey(f: Finding, cwd: string, fileCache?: Map<string, string>): string {
+  if (!f.file || f.line == null) {
+    const normalizedTitle = f.title
+      .toLowerCase()
+      .replace(/[^\w\s]/g, "")
+      .replace(/\s+/g, " ")
+      .trim();
+    const titleHash = crypto
+      .createHash("sha256")
+      .update(normalizedTitle)
+      .digest("hex")
+      .slice(0, 16);
+    return `${f.file || "global"}::${f.category}::${titleHash}`;
+  }
+  const contextHash = hashContextLines(cwd, f.file, f.line, 3, fileCache);
+  return `${f.file}::${f.category}::${contextHash}`;
+}
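`hashContextLines` is called above but not shown in this excerpt. A plausible sketch, matching the call signature `hashContextLines(cwd, file, line, radius, fileCache)` and the "±3 lines SHA-256" description, might look like the following. The body is an assumption: the real function may normalize whitespace or handle missing files differently.

```typescript
import { createHash } from "node:crypto";
import { readFileSync } from "node:fs";
import { resolve } from "node:path";

/**
 * Hashes the lines within `radius` of a 1-based line number.
 * The key depends on nearby content, not on the line number itself,
 * so it survives edits elsewhere in the file that shift line numbers.
 */
function hashContextLines(
  cwd: string,
  file: string,
  line: number,
  radius: number,
  fileCache?: Map<string, string>,
): string {
  const fullPath = resolve(cwd, file);
  let content = fileCache?.get(fullPath);
  if (content === undefined) {
    try {
      content = readFileSync(fullPath, "utf-8");
    } catch {
      content = ""; // Assumption: unreadable files hash as empty context.
    }
    fileCache?.set(fullPath, content);
  }
  const lines = content.split("\n");
  const start = Math.max(0, line - 1 - radius);
  const end = Math.min(lines.length, line + radius);
  // Trim each line so pure indentation changes do not alter the key.
  const context = lines
    .slice(start, end)
    .map((l) => l.trim())
    .join("\n");
  return createHash("sha256").update(context).digest("hex").slice(0, 16);
}
```

In this sketch two findings on the same code produce the same key even if lines were inserted far above, while an edit inside the ±3 window produces a new key and the finding is treated as new.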


+function checkQualityRatchet(
+  spec: TaskSpec,
+  round: number,
+  findingsCount: number,
+  previousCount: number,
+  beforeSha: string,
+  cwd: string,
+): boolean {
+  if (round <= 2 || findingsCount <= previousCount) {
+    return false;
+  }
+
+  // Validate SHA format before passing to execFileSync
+  if (!/^[0-9a-f]{40}$/.test(beforeSha)) {
+    console.warn(`  #${spec.id}: Invalid SHA for rollback, skipping reset.`);
+    return true;
+  }
+
+  try {
+    execFileSync("git", ["reset", "--hard", beforeSha], {
+      cwd,
+      stdio: "pipe",
+    });
+    console.warn(
+      `  #${spec.id} R${String(round)}: Regression detected (${String(previousCount)} → ${String(findingsCount)}). Rolled back.`,
+    );
+  } catch {
+    console.warn(`  #${spec.id}: Failed to reset to ${beforeSha} after regression.`);
+  }
+
+  return true;
+}
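The ratchet above rolls back a single regressing round. The description also mentions best-state tracking, which restores the best intermediate commit when the loop fails to converge. That code is not part of this excerpt, so the following is only a hypothetical sketch of the bookkeeping; `Checkpoint` and `BestStateTracker` are invented names, and the caller would still perform the actual `git reset`.

```typescript
// Hypothetical record of one round's outcome.
interface Checkpoint {
  findingsCount: number;
  round: number;
  sha: string;
}

const SHA_PATTERN = /^[0-9a-f]{40}$/;

class BestStateTracker {
  private best: Checkpoint | null = null;

  /** Remembers the checkpoint with the fewest findings; ties keep the earliest. */
  record(candidate: Checkpoint): void {
    if (!SHA_PATTERN.test(candidate.sha)) return; // Same SHA guard as the ratchet.
    if (this.best === null || candidate.findingsCount < this.best.findingsCount) {
      this.best = candidate;
    }
  }

  /** Returns the SHA to reset to, or null when HEAD is already the best state. */
  restoreTarget(currentSha: string): string | null {
    if (this.best === null || this.best.sha === currentSha) return null;
    return this.best.sha;
  }
}
```

Separating the bookkeeping from the git call keeps the selection rule testable without a repository.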


+  const newFindings = findings.filter(
+    (f) => f.confidence !== "LOW" && !seenKeys.has(findingKey(f, cwd, fileCache)),
+  );
+  for (const f of newFindings) {
+    seenKeys.add(findingKey(f, cwd, fileCache));
+  }
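The seen-set dedup in this hunk can be expressed as a generic helper. The sketch below is a variant rather than a copy: `takeNew` is a hypothetical name, it computes each key once, and it also collapses duplicates within a single round, which the filter-then-add form above does not do.

```typescript
/**
 * Returns the items whose key has not been seen before and records their keys.
 * An empty result for a round signals that the critic produced nothing new.
 */
function takeNew<T>(items: T[], seen: Set<string>, keyOf: (item: T) => string): T[] {
  const fresh: T[] = [];
  for (const item of items) {
    const key = keyOf(item);
    if (seen.has(key)) continue;
    seen.add(key);
    fresh.push(item);
  }
  return fresh;
}

// Three simulated critic rounds over string "findings" keyed by themselves.
const seenKeys = new Set<string>();
const identity = (s: string): string => s;
const round1 = takeNew(["a", "b", "a"], seenKeys, identity);
const round2 = takeNew(["b", "c"], seenKeys, identity);
const round3 = takeNew(["a", "c"], seenKeys, identity);
```

Here `round1` is `["a", "b"]`, `round2` is `["c"]`, and `round3` is empty, which is the point where the loop would stop iterating.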


+              prCreated = result.prCreated;
+            }
+
+            return { prCreated, spec };
+          })(),
+          (() => {
+            const p = new Promise<never>((_, reject) => {
+              setTimeout(() => {
+                reject(new Error(`Task #${spec.id} timed out after ${String(TASK_TIMEOUT_MS)}ms`));
+              }, TASK_TIMEOUT_MS).unref();
+            });
+            p.catch(() => {
+              /* suppress unhandled rejection when task completes before timeout */
            });
-          } catch (reviewError: unknown) {
-            const msg = reviewError instanceof Error ? reviewError.message : String(reviewError);
-            console.warn(`  Reviewer for #${issue.id} failed, proceeding unreviewed: ${msg}`);
-          }
-        }
+            return p;
+          })(),
+        ]),
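The inline `Promise.race` above needs `.unref()` and a no-op `.catch()` to keep the timer from holding the process open or raising an unhandled rejection. A later commit in this PR mentions extracting a `withTimeout` helper; its body is not shown here, so the following is only a sketch of what such a helper could look like. It clears the timer in `finally`, so the timeout promise never rejects once the work has settled.

```typescript
/**
 * Hypothetical helper: reject if `work` has not settled within `ms` milliseconds.
 * The underlying work is not cancelled; only the wait is abandoned.
 */
export function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => {
      reject(new Error(`${label} timed out after ${String(ms)}ms`));
    }, ms);
  });
  // Whichever settles first wins; the timer is always cleared afterwards.
  return Promise.race([work, timeout]).finally(() => {
    clearTimeout(timer);
  });
}
```

As with the original race, a timed-out task keeps running in the background, so sandbox disposal still has to be handled by the task itself.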


+function pushBranch(cwd: string, spec: TaskSpec, rebaseSucceeded: boolean): boolean {
+  if (rebaseSucceeded) {
+    try {
+      execFileSync("git", ["push", "--force-with-lease"], { cwd, stdio: "pipe" });
+      return true;
+    } catch (pushErr: unknown) {
+      const pushMsg = pushErr instanceof Error ? pushErr.message : String(pushErr);
+      try {
+        const suffix = crypto.randomBytes(4).toString("hex");
+        execFileSync("git", ["push", "origin", `HEAD:refs/heads/rescue/${spec.branch}-${suffix}`], {
+          cwd,
+          stdio: "pipe",
+        });
+        console.warn(
+          `  #${spec.id}: Push failed. Commits preserved at rescue/${spec.branch}-${suffix}`,
+        );
+      } catch {
+        console.error(
+          `  #${spec.id}: Push failed and rescue failed. Commits will be lost on sandbox disposal: ${pushMsg}`,
+        );
+      }
+      return false;
+    }
+  } else {
+    try {
+      execFileSync("git", ["push"], { cwd, stdio: "pipe" });
+      return true;
+    } catch (pushErr: unknown) {


+export type LoopStatus = "converged" | "exhausted" | "failed" | "skipped";
+
+/** Type alias for a sandcastle sandbox instance. */
+export type SandboxInstance = Awaited<ReturnType<typeof sandcastle.createSandbox>>;


…nHands)
- Validation in-loop (ARCS): deterministic convergence when tests pass mid-loop
- Best-state checkpoint (SWE-Agent): reset to best SHA on non-convergence
- Severity-weighted convergence (OpenHands): refuse convergence if CRITICAL/HIGH persist

…/bestSha mismatch

…orrectness)

- New constants.ts: shared constants (VALIDATION_COMMAND, timeouts, model names) + utilities (getHeadSha, toErrorMessage)
- refinement-loop.ts: decompose runRefinementLoop (CC 17→≤10), RoundContext/HashInput param objects, computeFindingKey rename
- finalizer.ts: add timeouts to all execFileSync, use runValidation helper consistently
- task-source.ts: add timeout, replace char loop with regex, fix terse names
- main.ts: extract withTimeout helper, use model constants
- types.ts: unexport FindingsSchema (internal only)

…nt audit findings)

Copilot

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 6 comments.

+      .update(`${file}:${String(line)}:${normalized}`)
+      .digest("hex")
+      .slice(0, HASH_PREFIX_LENGTH);
+  } catch {
+    return crypto
+      .createHash("sha256")
+      .update(`${file}:${String(line)}:fallback`)


+function parseFindings(stdout: string, nonce: string): Finding[] | null {
+  if (!/^[0-9a-f]+$/.test(nonce)) return null;
+  const tagPattern = new RegExp(`<findings-${nonce}>([\\s\\S]*?)<\\/findings-${nonce}>`, "g");
+  const matches = [...stdout.matchAll(tagPattern)];
+  if (matches.length === 0) return null;
+  // Find last non-trivial match
+  for (let i = matches.length - 1; i >= 0; i--) {
+    const raw = matches[i]?.[1]?.trim() ?? "";
+    if (raw.length < 2) continue;
+    const cleaned = raw.replace(/^```(?:json)?\s*\n?/g, "").replace(/\n?```\s*$/g, "");
+    try {
+      return parseFindingsSafe(JSON.parse(cleaned));
+    } catch {
+      continue;
+    }
+  }
+  return null;
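To illustrate why the nonce matters, here is a self-contained sketch of the same extraction without the Zod validation step (`parseFindingsSafe`) and without Markdown-fence stripping. `extractTagged` is a hypothetical name. A block tagged with a different nonce, for example one echoed from code under review, is ignored.

```typescript
/**
 * Hypothetical helper: return the parsed JSON of the last non-trivial
 * <findings-NONCE>…</findings-NONCE> block, or null if none parses.
 */
export function extractTagged(stdout: string, nonce: string): unknown {
  // Only hex nonces are accepted, so the nonce is safe to embed in a RegExp.
  if (!/^[0-9a-f]+$/.test(nonce)) return null;
  const pattern = new RegExp(`<findings-${nonce}>([\\s\\S]*?)</findings-${nonce}>`, "g");
  const bodies = [...stdout.matchAll(pattern)].map((m) => (m[1] ?? "").trim());
  for (const body of bodies.reverse()) {
    if (body.length < 2) continue;
    try {
      return JSON.parse(body) as unknown;
    } catch {
      // Malformed block: fall back to an earlier one.
    }
  }
  return null;
}
```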


+function pushBranch(cwd: string, spec: TaskSpec, rebaseSucceeded: boolean): boolean {
+  if (rebaseSucceeded) {
+    try {
+      execFileSync("git", ["push", "--force-with-lease"], {
+        cwd,
+        stdio: "pipe",
+        timeout: PUSH_TIMEOUT_MS,
+      });
+      return true;
+    } catch (pushErr: unknown) {
+      const pushMsg = toErrorMessage(pushErr);
+      try {
+        const suffix = crypto.randomBytes(4).toString("hex");
+        execFileSync("git", ["push", "origin", `HEAD:refs/heads/rescue/${spec.branch}-${suffix}`], {
+          cwd,
+          stdio: "pipe",
+          timeout: PUSH_TIMEOUT_MS,
+        });
+        console.warn(
+          `  #${spec.id}: Push failed. Commits preserved at rescue/${spec.branch}-${suffix}`,
+        );
+      } catch {
+        console.error(
+          `  #${spec.id}: Push failed and rescue failed. Commits will be lost on sandbox disposal: ${pushMsg}`,
+        );
+      }
+      return false;


+    if (result.findings === null) break;
+    const findings: Finding[] = result.findings;
+
+    if (result.commits > 0 && runMidLoopValidation(sandbox.worktreePath)) {
+      totalCommits += result.commits;
+      status = "converged";
+      break;
+    }


…ions, scope preference

…ck event loop)

Replace all blocking execFileSync calls with util.promisify(execFile) to enable true parallelism between tasks during subprocess execution.

- constants.ts: add execFileAsync export, convert getHeadSha to async
- refinement-loop.ts: captureHeadSha, checkQualityRatchet, checkConvergence, runMidLoopValidation, resetToBestState all async
- finalizer.ts: runValidation, attemptRebase, pushBranch all async
- task-source.ts: fetchAndSanitizeIssues async

readFileSync/realpathSync stay sync (<1ms local I/O, no benefit from async). maxBuffer: 8MB added to validation and gh issue list calls.
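For readers unfamiliar with the pattern, a minimal sketch of the promisified `execFile` wrapper follows. `execFileAsync` is the export name mentioned in the commit message; `runTrimmed` and the 30-second timeout are hypothetical and chosen only for illustration, while the 8 MB `maxBuffer` mirrors the value stated above.

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

/** Promise-returning execFile: awaiting it does not block the event loop. */
export const execFileAsync = promisify(execFile);

/** Hypothetical helper: run a command in `cwd` and return its trimmed stdout. */
export async function runTrimmed(cmd: string, args: string[], cwd: string): Promise<string> {
  const { stdout } = await execFileAsync(cmd, args, {
    cwd,
    maxBuffer: 8 * 1024 * 1024, // 8 MB, as in the commit message
    timeout: 30_000, // assumed value, for illustration only
  });
  return stdout.trim();
}
```

Under these assumptions, an async `getHeadSha` would reduce to something like `runTrimmed("git", ["rev-parse", "HEAD"], cwd)`, and several tasks can await their subprocesses concurrently.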

+/**
+ * Flat iteration budget per round (intentionally constant, not decreasing).
+ * Evidence: ARCS (arXiv:2504.20434), SWE-Agent, AutoCodeRover all use flat budgets.
+ * Decreasing schedules penalize harder residual problems in later rounds.
+ */
+export const ITERATION_BUDGET_PER_ROUND = 50;


+    if (newFindings.length < bestFindingsCount) {
+      bestFindingsCount = newFindings.length;
+      bestSha = await captureHeadSha(cwd);
+    }
+
+    totalCommits += result.commits;
+    previousFindingsCount = nonLowFindings.length;
+    onRoundComplete(round, findings);
+
+    const convergenceResult = await checkConvergence(cwd, findings, newFindings, nonLowFindings);
+    if (convergenceResult !== null) {
+      lastFindings = convergenceResult.lastFindings;
+      status = convergenceResult.status;
+      bestSha = convergenceResult.bestSha;
+      break;
+    }
+
+    lastFindings = newFindings;
+  }
+
+  if (shouldResetToBest(status, bestSha)) {
+    totalCommits = await resetToBestState(sandbox.worktreePath, bestSha, totalCommits);
+  }
+
+  return { lastFindings, roundsCompleted, status, totalCommits };
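The body of `checkConvergence` is not part of this excerpt. Based only on the design notes in the PR description (no new findings means the loop can stop, but persisting CRITICAL/HIGH findings must not count as convergence, and LOW-only findings should not block it), the rule might be sketched as below. The `SketchFinding` shape, the `Verdict` labels, and `convergenceVerdict` are all assumptions, not the PR's types.

```typescript
type Severity = "CRITICAL" | "HIGH" | "MEDIUM" | "LOW";

/** Hypothetical, reduced finding shape for this sketch. */
interface SketchFinding {
  severity: Severity;
  title: string;
}

/** Hypothetical verdict labels; the PR's own LoopStatus values differ. */
export type Verdict = "converged" | "stalled" | "continue";

/**
 * `all` is every finding the critic reported this round;
 * `fresh` is the subset that survived dedup and the LOW-confidence filter.
 */
export function convergenceVerdict(
  all: readonly SketchFinding[],
  fresh: readonly SketchFinding[],
): Verdict {
  if (fresh.length > 0) return "continue";
  // Nothing new: only call it converged if no severe finding is being re-reported.
  const blocking = all.some((f) => f.severity === "CRITICAL" || f.severity === "HIGH");
  return blocking ? "stalled" : "converged";
}
```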


+  // Implementer
+  let implementerResult: Awaited<ReturnType<typeof sandbox.run>>;
+  try {
+    implementerResult = await sandbox.run({
+      agent: sandcastle.opencode(AGENT_MODEL),
+      maxIterations: budget,
+      name: `Implementer #${spec.id} R${String(round)}`,
+      promptArgs: {
+        BRANCH: spec.branch,
+        FINDINGS: findingsArg,
+        ISSUE_BODY: spec.body,
+        ISSUE_TITLE: spec.title,
+        TASK_ID: spec.id,
+      },
+      promptFile: "./.sandcastle/implement-prompt.md",
+    });
+  } catch (err: unknown) {
+    const msg = err instanceof Error ? (err.stack ?? err.message) : String(err);
+    console.error(`  #${spec.id} R${String(round)}: Implementer threw: ${msg}`);
+    return { beforeSha, commits: 0, findings: null };
+  }
+
+  // Critic
+  const nonce = crypto.randomBytes(4).toString("hex");
+  let findings: Finding[] | null;
+  try {
+    findings = await runCritic(sandbox, spec, round, nonce);
+  } catch (err: unknown) {
+    const msg = err instanceof Error ? err.message : String(err);
+    console.error(`  #${spec.id} R${String(round)}: Critic threw: ${msg}`);
+    findings = null;
+  }
+
+  return { beforeSha, commits: implementerResult.commits.length, findings };


+async function pushBranch(cwd: string, spec: TaskSpec, rebaseSucceeded: boolean): Promise<boolean> {
+  if (rebaseSucceeded) {
+    try {
+      await execFileAsync("git", ["push", "--force-with-lease"], {
+        cwd,
+        timeout: PUSH_TIMEOUT_MS,
+      });
+      return true;
+    } catch (pushErr: unknown) {
+      const pushMsg = toErrorMessage(pushErr);
+      try {
+        const suffix = crypto.randomBytes(4).toString("hex");
+        await execFileAsync(
+          "git",
+          ["push", "origin", `HEAD:refs/heads/rescue/${spec.branch}-${suffix}`],
+          {
+            cwd,
+            timeout: PUSH_TIMEOUT_MS,
+          },
+        );
+        console.warn(
+          `  #${spec.id}: Push failed. Commits preserved at rescue/${spec.branch}-${suffix}`,
+        );
+      } catch {
+        console.error(
+          `  #${spec.id}: Push failed and rescue failed. Commits will be lost on sandbox disposal: ${pushMsg}`,
+        );
+      }
+      return false;
+    }


…atch type annotations

+  if (round === 1 && result.commits === 0) {
+    console.warn(`  #${spec.id}: 0 commits on round 1. Skipping.`);
+    return { status: "skipped", totalCommits };
+  }
+  if (result.findings === null) {
+    console.warn(`  #${spec.id}: Critic failed twice. Breaking (non-converged).`);
+    return { status: "failed", totalCommits: totalCommits + result.commits };
+  }


+
+    if (result.commits > 0 && (await runMidLoopValidation(sandbox.worktreePath))) {
+      totalCommits += result.commits;
+      status = "converged";


+  validationPassed: boolean,
+  rebaseSucceeded: boolean,
+): { isDraft: boolean; prArgs: string[] } {
+  const converged = loopResult.status === "converged";


jerome-benoit added 2 commits May 5, 2026 01:46

fix: guard retry calls, split rebase logic, remove dead critic retry

d7cd5e2

jerome-benoit added 2 commits May 5, 2026 01:57

fix: handle nullable issue body and guard JSON parse

bfad1d9

fix: distinguish stalled from converged (re-reported findings → draft PR)

e6b613a

fix: LOW-only findings should not prevent convergence

5901fbf

fix: log validation errors, conditional checklist, derive PR title from labels

ee43546

jerome-benoit force-pushed the feat/sandcastle-refinement-loop branch from f489cea to ee43546 Compare May 5, 2026 00:18

jerome-benoit added 2 commits May 5, 2026 13:38

fix: centralize constants, fix PR type-of-change, sanitize titles

3215748

jerome-benoit added 2 commits May 5, 2026 14:06

fix: full validation post-rebase, execFileSync for gh issue list

25e85de

perf: skip critic when implementer produces 0 commits on round 2+

4f83a6c

jerome-benoit added 2 commits May 5, 2026 18:18

fix: check findings null before commits zero (correct status on implementer crash)

29c2ce6

fix: report known findings in PR body even when converged (prevents silent false convergence)

a4f323d

jerome-benoit added 2 commits May 5, 2026 18:42

fix: move bestSha after ratchet, add validation timeout, fix severity/bestSha mismatch

2f8cbf7

fix: recount totalCommits from git after best-state reset (semantic correctness)

5ef3725

jerome-benoit added 2 commits May 5, 2026 20:19

fix: use constants from constants.ts + add planner timeout (multi-agent audit findings)

69ca4fa

jerome-benoit added 2 commits May 5, 2026 21:51

refactor(.sandcastle): harden prompts — cap findings, add known decisions, scope preference

d8b3454

fix: catch planner timeout rejection (retry instead of crash) + add catch type annotations

96bbd80

jerome-benoit changed the title ~~feat: implement sandcastle refinement loop with critic-based convergence~~ feat: sandcastle refinement loop with critic-based convergence May 5, 2026

jerome-benoit mentioned this pull request May 5, 2026

[Feature Request]: Implement/review refinement loop with deterministic convergence #110

Closed

Merge branch 'main' into feat/sandcastle-refinement-loop

7954fec

jerome-benoit merged commit b6b7db6 into main May 5, 2026
9 checks passed

feat: sandcastle refinement loop with critic-based convergence #111

jerome-benoit merged 32 commits into main from feat/sandcastle-refinement-loop

		@@ -115,99 +180,186 @@ for (let iteration = 1; iteration <= MAX_PLANNER_RETRIES; iteration++) {
		sandbox: docker({ imageName: DOCKER_IMAGE }),
		});


		const prTitle = `fix: resolve #${issue.id} — ${issue.title}`;
		const prBody = `## Description\n\nAutomated fix for #${issue.id}: ${issue.title}\n\n## Type of Change\n\n- [x] Bug fix (non-breaking change that fixes an issue)\n\n## Checklist\n\n- [x] I have run validation suite\n- [x] My changes follow the existing code style\n\n## Related Issues\n\nFixes #${issue.id}${outstandingNote}${validationNote}`;

		const prTitle = `${commitPrefix}: resolve #${issue.id} — ${issue.title}`;
		const prBody = `## Description\n\nAutomated fix for #${issue.id}: ${issue.title}\n\n## Type of Change\n\n- [x] Bug fix (non-breaking change that fixes an issue)\n\n## Checklist\n\n${validationCheck} I have run validation suite\n- [x] My changes follow the existing code style\n\n## Related Issues\n\nFixes #${issue.id}${outstandingNote}${validationNote}`;

		Run `git diff main...{{BRANCH}}` to see all changes. Examine the diff carefully. For each issue found, produce a structured finding.
